Introduction

We aim to analyze the number of American citizens killed or wounded in terrorist attacks and to trace how the activity of terrorist groups developed over time. We also examine the perpetrator groups and the geographical regions where the attacks occurred, and slice and dice the data by type of target, type of attack and type of weapon.

About the data

This Kaggle dataset, the Global Terrorism Database (GTD), records the date, time, location, number of hostages, killed, wounded, weapons used and more for each incident, across 135 columns and 181691 rows. The GTD is an open-source database including information on terrorist attacks around the world from 1970 through 2017. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.
For this project, we subset the dataset to the rows where the column ‘country_txt’ is ‘United States’. This subset consists of 135 columns and 2836 rows.

Preparatory work

Install Packages & Import Libraries

Let’s begin by installing the required packages and loading the libraries used in this analysis.
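The original setup code is not shown, so the following is only a minimal sketch. The package list is an assumption inferred from the functions called later in this notebook (`%>%`, `filter`, `group_by` and `summarise` from dplyr, `ggplot` from ggplot2, `leaflet` and `treemap` from the packages of the same name). The sketch reports missing packages instead of installing them automatically.

```r
# Packages inferred from the functions used later in the notebook
# (an assumption; the author's actual list is not shown).
required_packages <- c("dplyr", "ggplot2", "leaflet", "treemap")

# Check which of the packages are available on this machine.
is_installed <- vapply(required_packages, requireNamespace,
                       logical(1), quietly = TRUE)
missing_packages <- required_packages[!is_installed]

# Report anything that still has to be installed by hand.
if (length(missing_packages) > 0) {
  message("Install first: ", paste(missing_packages, collapse = ", "))
}

# Attach the packages that are available.
for (pkg in required_packages[is_installed]) {
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}
```

Any package listed in the message can then be installed with `install.packages()` before the rest of the notebook is run.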

Read Data

Next, let’s set our working directory and read the dataset into our working environment.

setwd("D:/1st Qtr Study Material/R/Project 3/Project 3")  #set working directory
data = read.csv("USTerrorismData.csv", header = TRUE)     #reading data
nrow(data)                                                #printing number of rows
## [1] 2836
ncol(data)                                                #printing number of columns
## [1] 135
head(data, 2)                                             #printing first 2 rows of data
##       eventid iyear imonth iday approxdate extended resolution country
## 1 1.97001e+11  1970      1    1                   0                217
## 2 1.97001e+11  1970      1    2                   0                217
##     country_txt region    region_txt  provstate    city latitude
## 1 United States      1 North America   Illinois   Cairo 37.00511
## 2 United States      1 North America California Oakland 37.79193
##    longitude specificity vicinity        location
## 1  -89.17627           1        0                
## 2 -122.22591           1        0 Edes Substation
##                                                                                                                                                                                                                                                                                                                                                    summary
## 1 1/1/1970: Unknown African American assailants fired several bullets at police headquarters in Cairo, Illinois, United States.  There were no casualties, however, one bullet narrowly missed several police officers.  This attack took place during heightened racial tensions, including a Black boycott of White-owned businesses, in Cairo Illinois.
## 2                                                                                                    1/2/1970: Unknown perpetrators detonated explosives at the Pacific Gas & Electric Company Edes substation in Oakland, California, United States.  Three transformers were damaged costing an estimated $20,000 to $25,000.  There were no casualties.
##   crit1 crit2 crit3 doubtterr alternative  alternative_txt multiple
## 1     1     1     1         0          NA                         0
## 2     1     1     1         1           2 Other Crime Type        0
##   success suicide attacktype1   attacktype1_txt attacktype2
## 1       1       0           2     Armed Assault          NA
## 2       1       0           3 Bombing/Explosion          NA
##   attacktype2_txt attacktype3 attacktype3_txt targtype1 targtype1_txt
## 1                          NA                         3        Police
## 2                          NA                        21     Utilities
##   targsubtype1                                targsubtype1_txt
## 1           22 Police Building (headquarters, station, school)
## 2          107                                     Electricity
##                            corp1                   target1 natlty1
## 1        Cairo Police Department Cairo Police Headquarters     217
## 2 Pacific Gas & Electric Company           Edes Substation     217
##     natlty1_txt targtype2 targtype2_txt targsubtype2 targsubtype2_txt
## 1 United States        NA                         NA                 
## 2 United States        NA                         NA                 
##   corp2 target2 natlty2 natlty2_txt targtype3 targtype3_txt targsubtype3
## 1                    NA                    NA                         NA
## 2                    NA                    NA                         NA
##   targsubtype3_txt corp3 target3 natlty3 natlty3_txt              gname
## 1                                     NA             Black Nationalists
## 2                                     NA                        Unknown
##   gsubname gname2 gsubname2 gname3 gsubname3
## 1                                         NA
## 2                                         NA
##                                           motive guncertain1 guncertain2
## 1 To protest the Cairo Illinois Police Deparment           0          NA
## 2                                                          0          NA
##   guncertain3 individual nperps nperpcap claimed claimmode claimmode_txt
## 1          NA          0    -99      -99       0        NA              
## 2          NA          0    -99      -99       0        NA              
##   claim2 claimmode2 claimmode2_txt claim3 claimmode3 claimmode3_txt
## 1     NA         NA                    NA         NA               
## 2     NA         NA                    NA         NA               
##   compclaim weaptype1 weaptype1_txt weapsubtype1       weapsubtype1_txt
## 1        NA         5      Firearms            5       Unknown Gun Type
## 2        NA         6    Explosives           16 Unknown Explosive Type
##   weaptype2 weaptype2_txt weapsubtype2 weapsubtype2_txt weaptype3
## 1        NA                         NA                         NA
## 2        NA                         NA                         NA
##   weaptype3_txt weapsubtype3 weapsubtype3_txt weaptype4 weaptype4_txt
## 1                         NA                         NA            NA
## 2                         NA                         NA            NA
##   weapsubtype4 weapsubtype4_txt                   weapdetail nkill nkillus
## 1           NA               NA Several gunshots were fired.     0       0
## 2           NA               NA                                  0       0
##   nkillter nwound nwoundus nwoundte property propextent
## 1        0      0        0        0        1          3
## 2        0      0        0        0        1          3
##                propextent_txt propvalue                      propcomment
## 1 Minor (likely < $1 million)        NA                                 
## 2 Minor (likely < $1 million)     22500 Three transformers were damaged.
##   ishostkid nhostkid nhostkidus nhours ndays divert kidhijcountry ransom
## 1         0       NA         NA     NA    NA                           0
## 2         0       NA         NA     NA    NA                           0
##   ransomamt ransomamtus ransompaid ransompaidus ransomnote hostkidoutcome
## 1        NA          NA         NA           NA                        NA
## 2        NA          NA         NA           NA                        NA
##   hostkidoutcome_txt nreleased
## 1                           NA
## 2                           NA
##                                                                           addnotes
## 1 The Cairo Chief of Police, William Petersen, resigned as a result of the attack.
## 2                            Damages were estimated to be between $20,000-$25,000.
##                                                                                                                                              scite1
## 1                                                                                           "Police Chief Quits," Washington Post, January 2, 1970.
## 2 Committee on Government Operations United States Senate, "Riots, Civil, and Criminal Disorders," U.S. Government Printing Office, August 6, 1970.
##                                                                                                                          scite2
## 1                                       "Cairo Police Chief Quits; Decries Local 'Militants'," Afro-American, January 10, 1970.
## 2 Christopher Hewitt, "Political Violence and Terrorism in Modern America: A Chronology," Praeger Security International, 2005.
##                                                                                                                          scite3
## 1 Christopher Hewitt, "Political Violence and Terrorism in Modern America: A Chronology," Praeger Security International, 2005.
## 2                                                                                                                              
##         dbsource INT_LOG INT_IDEO INT_MISC INT_ANY related
## 1 Hewitt Project      -9       -9        0      -9        
## 2 Hewitt Project      -9       -9        0      -9

Cleaning Data

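Considering the large number of attributes, some data is bound to be missing. Some of it is already coded as ‘NA’, while the rest appears as empty cells, so we replace the empty cells with ‘NA’ as well. Comparing the NA counts before and after the replacement shows that the data had 211454 - 111433 = 100021 empty cells.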

sum(is.na(data))        #Print total number of NA
## [1] 111433
data[data == ""] = NA   #Replace empty cells with NA
sum(is.na(data))        #Print total number of NA
## [1] 211454
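The empty cells can also be counted directly and checked against the difference between the two NA totals. The sketch below illustrates this on a small made-up data frame; the `toy` values are invented for illustration and are not taken from the dataset. The replacement guards against existing NA values explicitly, which has the same effect as the one-line replacement used above.

```r
# Made-up toy data: two text columns with empty cells, one numeric with an NA.
toy <- data.frame(
  city   = c("City A", "", "City B"),
  motive = c("", "", "Some motive"),
  nkill  = c(0, NA, 2),
  stringsAsFactors = FALSE
)

na_before   <- sum(is.na(toy))              # cells that are already NA
empty_cells <- sum(toy == "", na.rm = TRUE) # cells that are empty strings

# Replace empty strings with NA, skipping cells that are already NA.
toy[!is.na(toy) & toy == ""] <- NA

na_after <- sum(is.na(toy))

# The increase in NA cells equals the number of empty cells.
na_after - na_before == empty_cells
```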

Visualizing All Terrorist Attacks on USA

The interactive map below lets you zoom in and out to see where each attack took place.
To view the details of an attack, click on its red dot.
Note: This project covers only attacks in the USA, so no red dots appear in other countries.

#creating map using leaflet function
mapUSA = leaflet() %>% 
  addTiles('http://{s}.basemaps.cartocdn.com/dark_all/{z}/{x}/{y}.png',
          attribution='&copy; 
          <a href="http://www.openstreetmap.org/copyright">OpenStreetMap</a>') %>%
  setView(-95, 40, zoom = 4.2)


#adding layers to map
mapUSA %>% addCircles (data=data, lat= ~latitude, lng = ~longitude, 
                       popup=paste(
                         "<strong>Year: </strong>", data$iyear,
                         "<br><strong>City: </strong>", data$city, 
                         "<br><strong>Country: </strong>", data$country_txt, 
                         "<br><strong>Attack type: </strong>", data$attacktype1_txt, 
                         "<br><strong>Target: </strong>", data$targtype1_txt, 
                         " | ", data$targsubtype1_txt, 
                         " | ", data$target1, 
                         "<br><strong>Weapon: </strong>", data$weaptype1_txt, 
                         "<br><strong>Group: </strong>", data$gname, 
                         "<br><strong>Motive: </strong>", data$motive, 
                         "<br><strong>Summary: </strong>", data$summary),
                       weight = 0.9, color="#8B1A1A", stroke = TRUE, fillOpacity = 0.6)
## Warning in validateCoords(lng, lat, funcName): Data contains 1 rows with
## either missing or invalid lat/lon values and will be ignored

1A. Killings from Terrorist Attacks on USA between 1970 - 2017 by Year and State

Looking at the statewise split, we see a few states where the number of people killed in terrorist attacks is very large. Some of these states, like Nevada and Florida, experienced a massive recent increase, while others, like New York, have a long history. We also notice that many people lost their lives in New York in 2001, suggesting a major terrorist incident. Our hypothesis is that this reflects the 9/11 attacks in New York in 2001. We will check this hypothesis later in this project.

kills = data %>% filter(nkill > 0)  #subset data where nkill is greater than 0.
#Killings yearwise
treemap(kills, 
        index=c("iyear"), 
        vSize = "nkill",  
        palette = "Reds",  
        title="Killings in USA Terrorism by year", 
        fontsize.title = 14 
)

#Killings statewise
treemap(kills, 
        index=c("provstate"), 
        vSize = "nkill",  
        palette = "Reds",  
        title="Killings in USA Terrorism by state", 
        fontsize.title = 14 
)

1B. Killings from Terrorist Attacks on USA between 2008 - 2017 by Year and State

Let’s look at the killings from terrorist activities in the last decade. In the following treemap, the area for each year is proportional to the number of people killed in terrorist activities in that year. We can easily see a massive increase in killings from 2015 onwards: in the last 3 years (2015-2017) the volume was significantly higher than in the previous years.

kills = data %>% filter(nkill > 0, iyear > 2007)  #subset data where nkill is greater than 0 and year is after 2007.

#Killings yearwise
treemap(kills, 
        index=c("iyear"), 
        vSize = "nkill",  
        palette = "Reds",  
        title="Killings in USA Terrorism by year in last decade", 
        fontsize.title = 14 
)

#Killings statewise
treemap(kills, 
        index=c("provstate"), 
        vSize = "nkill",  
        palette = "Reds",  
        title="Killings in USA Terrorism by state in last decade", 
        fontsize.title = 14 
)
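The treemap areas correspond to sums of `nkill` per year or per state, so the same totals can be tabulated to read off exact numbers. The following is a minimal sketch in base R on made-up values; the `toy` data frame and its state labels are hypothetical and only mimic the columns `iyear`, `provstate` and `nkill`.

```r
# Made-up toy data with the same column names as the dataset.
toy <- data.frame(
  iyear     = c(2001, 2001, 2015, 2016, 2016),
  provstate = c("State A", "State B", "State C", "State D", "State D"),
  nkill     = c(5, 2, 0, 3, NA)
)

# Keep incidents with at least one death (rows with NA are dropped).
kills_toy <- toy[!is.na(toy$nkill) & toy$nkill > 0, ]

# Totals that the treemap areas would be proportional to.
kills_by_year  <- aggregate(nkill ~ iyear, data = kills_toy, FUN = sum)
kills_by_state <- aggregate(nkill ~ provstate, data = kills_toy, FUN = sum)

kills_by_year
kills_by_state
```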

2A. Terror Attacks on USA between 1970-2017 by Attack Type

Let’s inspect the evolution of events in time, grouped by type of attack perpetrated.

#Selecting data by grouping year & attackType
AttackType = data %>% 
  group_by(iyear,attacktype1_txt) %>% 
  summarise(n = length(iyear)) %>% 
  ungroup()


#Assigning Column Names for subset of data
colnames(AttackType) = c("Year","Type of attack","Number of events")


#plotting total number of incidences(events) by year based on type of attack
ggplot(data = AttackType, aes(x = Year, y = `Number of events`, colour = `Type of attack`)) + 
  geom_line() + 
  geom_point() + 
  theme_bw()

#plotting number of incidents by attack type
ggplot(data, aes(x = iyear)) +
  labs(title =" Terrorist attacks in US between 1970-2017 by attack type", x = "Years", y = "Number of Attacks", size = 15) +
  geom_bar(colour = "grey19", fill = "tomato3") + 
  facet_wrap(~attacktype1_txt,scales = "free", ncol = 3) + 
  theme(axis.text.x = element_text(hjust = 1, size = 12, angle = 45)) + 
  theme(strip.text = element_text(size = 10, face = "bold"))

2B. Top 6 states with highest terror attack frequency

Let’s drill down to see which 6 states in the USA faced the highest number of attacks, and what types of attack they faced. Between 1970 and 2017, California, New York, Puerto Rico, Florida, Illinois and Washington experienced the highest frequency of attacks.
The map below also shows the frequency of these attacks, clustered by location. Zoom in and out of the map to switch between a country-level, state-level or city-level view. Click on the markers on the map for more information.

#Selecting data by grouping year & attackType and state
AttackTypeState = data %>% 
  group_by(iyear,attacktype1_txt, provstate) %>% 
  summarise(n = length(iyear))


#Assigning Column Names for subset of data
colnames(AttackTypeState) = c("Year","Type of attack","State","Number of events")


#filtering data to get top 6 states with highest number of incidences 
top6 = AttackTypeState %>% 
  group_by(State) %>% 
  summarise(n = sum(`Number of events`)) %>% 
  arrange(desc(n)) %>%
  top_n(6)


#extracting State names from filtered data
top6states = as.factor(top6$State) 
top6states
## [1] California  New York    Puerto Rico Florida     Illinois    Washington 
## 55 Levels:  Alabama Alaska Arizona Arkansas California ... Wyoming
#filter our actual dataset to get information on incidences belonging to top6 states
AttackTypeState = filter(AttackTypeState, `State` %in% top6states)


#plotting total number of incidences(events) in top 6 states by year based on type of attack
ggplot(data = AttackTypeState, aes(x = Year, y = `Number of events`, group = `Type of attack`, colour = `Type of attack`)) + 
  geom_line() + 
  facet_wrap(~State) + 
  theme_bw() + 
  theme(axis.text.x = element_text(angle=45))

#plotting map of total number of incidences(events) in states of USA.
leaflet(data = data) %>%
  addTiles('http://{s}.basemaps.cartocdn.com/dark_all/{z}/{x}/{y}.png',
           attribution='&copy; 
           <a href="http://www.openstreetmap.org/copyright">OpenStreetMap</a>') %>%
  setView(-95, 40, zoom = 4.2) %>%
  addMarkers(lat=data$latitude, lng=data$longitude, clusterOptions = markerClusterOptions(),
             popup= paste("<strong>Date: </strong>", data$iday,"/",data$imonth,"/", data$iyear,
                          "<br><br><strong>Place: </strong>", data$city,"-",data$country_txt,
                          "<br><strong>Killed: </strong>", data$nkill,
                          "<br><strong>Wounded: </strong>", data$nwound
             ))
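The top-6 selection above can be cross-checked by counting incidents per state directly. The sketch below uses base R on a made-up vector of state labels (hypothetical values, not the dataset). Note that `head()` truncates at exactly the requested number, whereas `top_n()` keeps tied rows, so the two can differ when states are tied at the cut-off.

```r
# Made-up state labels, one entry per incident.
state <- c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D")

# Count incidents per state and sort from most to fewest.
counts <- sort(table(state), decreasing = TRUE)

# Names of the two most frequently attacked states in the toy data.
top2 <- names(head(counts, 2))
top2
```

On the real data, the analogous call would use `data$provstate` and `head(counts, 6)`.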

3A. Attacks by Terror Groups

The table below shows the number of incidents, the number of people killed and the number of people wounded, grouped by terror group name. We notice one clear and very important outlier, which points to the 9/11 attacks: Al-Qaida shows 4 incidents with over 3000 killed and over 16,000 wounded.

select(data, iyear, nkill, nwound, gname)  %>% 
  group_by(gname) %>% 
  summarise("#Incidences" = n(), "#Kills" = sum(nkill), "#Wounded" = sum(nwound))
## # A tibble: 234 x 4
##    gname                                  `#Incidences` `#Kills` `#Wounded`
##    <fct>                                          <int>    <int>      <int>
##  1 Action Squad                                       1        0          0
##  2 African-American extremists                        1        0          0
##  3 Al-Qaida                                           4     3001      16493
##  4 Al-Qaida in the Arabian Peninsula (AQ~             1        0          2
##  5 American Indian Movement                           6        2          2
##  6 American Servicemen's Union (ASU)                  3        0          0
##  7 Americans for a Competent Federal Jud~             5       NA         NA
##  8 Americans for Justice                              2        0          0
##  9 Anarchists                                         3        0          0
## 10 Animal Liberation Front (ALF)                     76        0          2
## # ... with 224 more rows
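Some groups in the table show `NA` totals because `sum()` returns `NA` as soon as one incident has a missing count. The sketch below, on made-up values with hypothetical group names, shows how `na.rm = TRUE` changes the result and how to count the missing entries so that the caveat stays visible.

```r
# Made-up toy data: one incident of "Group A" has no casualty count.
toy <- data.frame(
  gname = c("Group A", "Group A", "Group B", "Group B"),
  nkill = c(1, NA, 0, 2)
)

# Default behaviour: any NA in a group makes the group total NA.
strict <- tapply(toy$nkill, toy$gname, sum)

# Ignoring missing counts gives a lower bound for each group.
lenient <- tapply(toy$nkill, toy$gname, sum, na.rm = TRUE)

# Number of incidents per group whose count is missing.
n_missing <- tapply(is.na(toy$nkill), toy$gname, sum)

strict
lenient
n_missing
```

In the dplyr pipeline above, the same effect would come from writing `sum(nkill, na.rm = TRUE)` inside `summarise()`, at the cost of treating unknown counts as zero.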

3B. Attacks by Terror Groups and AttackType

The table below shows the number of incidents, the number of people killed and the number of people wounded, grouped by terror group name and by type of attack. For the important outlier that we saw in the previous table, the type of attack was Hijacking. This finding is consistent with our hypothesis in 1A.
It indicates that the spike in New York in 2001 came from the attack by Al-Qaida, who hijacked planes and crashed them, resulting in over 3000 deaths and over 16,000 wounded.

select(data, iyear, attacktype1_txt, nkill, nwound, gname)  %>% 
  group_by(gname, attacktype1_txt) %>% 
  summarise("#Incidences" = n(), "#Kills" = sum(nkill), "#Wounded" = sum(nwound))
## # A tibble: 416 x 5
## # Groups:   gname [234]
##    gname             attacktype1_txt      `#Incidences` `#Kills` `#Wounded`
##    <fct>             <fct>                        <int>    <int>      <int>
##  1 Action Squad      Facility/Infrastruc~             1        0          0
##  2 African-American~ Hostage Taking (Kid~             1        0          0
##  3 Al-Qaida          Hijacking                        4     3001      16493
##  4 Al-Qaida in the ~ Bombing/Explosion                1        0          2
##  5 American Indian ~ Bombing/Explosion                5        0          0
##  6 American Indian ~ Hostage Taking (Bar~             1        2          2
##  7 American Service~ Bombing/Explosion                3        0          0
##  8 Americans for a ~ Assassination                    1        1          0
##  9 Americans for a ~ Bombing/Explosion                4       NA         NA
## 10 Americans for Ju~ Bombing/Explosion                2        0          0
## # ... with 406 more rows

4. Attacks by TARGET Type

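Let’s change our view and look at these attacks grouped by target type. The plots below use the second target type (targtype2_txt), so they cover only incidents with a recorded second target. Within that subset, we see a surprising insight: Journalists & Media were the most attacked targets since 1980!

#keeping only rows with a recorded second target subtype (rows where it is NA or '.' are dropped)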
dataClean = data[which(data$targsubtype2_txt !='.'), ] 


#Plotting attacks by target type
ggplot(dataClean, aes(x = iyear))+ labs(title =" Terrorist attacks on USA between 1970-2017 by TARGET type", x = "Years", y = "Number of Attacks") + 
  geom_bar(colour = "grey19", fill = "tomato3") + facet_wrap(~targtype2_txt, ncol = 4) + theme(axis.text.x = element_text(hjust = 1, angle = 45))+
  theme(strip.text = element_text(size = 11, face = "bold"))

#plotting yearly attacks by target type
ggplot(data=dataClean, aes(x=iyear,fill=targtype2_txt)) + geom_bar() + ggtitle("Yearly terrorist attacks by TARGET type")+         
    labs(x = "Years", y = "Number of Attacks")
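The filter that builds `dataClean` relies on how `which()` treats missing values: a comparison with `NA` yields `NA`, and `which()` returns only the positions that are `TRUE`. The sketch below shows this on a made-up vector; the labels are hypothetical stand-ins for `targsubtype2_txt` values.

```r
# Made-up stand-in for the targsubtype2_txt column.
targsubtype2_txt <- c("Type A", NA, ".", NA, "Type B")

# The comparison is NA wherever the value itself is NA.
is_real_value <- targsubtype2_txt != "."

# which() keeps only TRUE positions, so NA and "." rows are both dropped.
keep <- which(is_real_value)
keep

# Share of rows that survive the filter.
length(keep) / length(targsubtype2_txt)
```

Checking this share on the real data, for example with `nrow(dataClean) / nrow(data)`, shows how much of the dataset the target type plots actually cover.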

5. Attacks by WEAPON Type

Let’s change our view again, and look at these attacks grouped by Weapon Type.
We see a very high peak in the use of Explosives during 1970 - 1980. Further research shows that in a single eighteen-month period during 1971 and 1972 the FBI counted an amazing 2,500 bombings on American soil, almost five a day. Because the bombs were typically detonated late at night, few caused serious injury, but they resulted in great turmoil.

#plotting attacks by weapon type
ggplot(data, aes(x = iyear))+ labs(title =" Terrorist attacks on USA between 1970-2017 by WEAPON type", x = "Years", y = "Number of Attacks") + 
  geom_bar(colour = "grey19", fill = "tomato3") + 
  facet_wrap(~weaptype1_txt, ncol = 2) + theme(axis.text.x = element_text(hjust = 1, angle = 45))+ theme(strip.text = element_text(size = 11, face = "bold"))

#plotting yearly attacks by weapon type 
ggplot(data=data, aes(x=iyear,fill=weaptype1_txt)) + 
    geom_bar() + ggtitle("Yearly terrorist attacks by WEAPON type")+ 
    labs(x = "Years", y = "Number of Attacks")
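A peak that is visible in the bar charts can also be located numerically by counting incidents per year for one weapon type. The following is a minimal sketch on made-up values; the vectors `iyear` and `weaptype1_txt` here are hypothetical toy data that only reuse the column names.

```r
# Made-up toy data: one year and one weapon type per incident.
iyear         <- c(1970, 1971, 1971, 1971, 1985, 1971)
weaptype1_txt <- c("Explosives", "Explosives", "Firearms",
                   "Explosives", "Explosives", "Explosives")

# Count incidents per year for a single weapon type.
explosives_per_year <- table(iyear[weaptype1_txt == "Explosives"])

# Year with the highest count (the first one if several years are tied).
peak_year <- as.integer(names(which.max(explosives_per_year)))
peak_year
```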

Summary

This data visualization and storytelling exercise in R is an exploratory analysis of the Global Terrorism Database, restricted to the United States of America to keep the scope manageable. The remarks below place the findings of this notebook in a wider context:

In the field of global security, Big Data analytics aims at pre-emption (stopping an attack at an early stage, say via communications analysis and tracking purchases of dangerous or suspicious materials) and prevention (intervening before violent action is carried out, say via network disruption or by identifying persons at risk).
The use of Big Data can help anticipate future threats and allow counter-terrorism forces to deploy their limited resources more efficiently, concentrating on the most serious and immediate threats. However, many problems in counter-terrorism remain open and are worth researching in the future.